Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macauff Import Pipeline #185

Closed
wants to merge 1 commit into from
Closed

Macauff Import Pipeline #185

wants to merge 1 commit into from

Conversation

maxwest-uw
Copy link
Contributor

Change Description

#147

  • My PR includes a link to the issue that I am addressing

Solution Description

copartitions based on the left catalog

NOTE: I think we could generalize this function a lot and make it so that we just copartition based on a given catalog and make it an association table.

New Feature Checklist

  • I have added or updated the docstrings associated with my feature using the NumPy docstring format
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover my new feature
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

@maxwest-uw maxwest-uw marked this pull request as draft December 6, 2023 00:34
@maxwest-uw maxwest-uw changed the title initial commit Macauff Import Pipeline Dec 6, 2023
self.resume_plan = ResumePlan(
input_paths=self.input_paths,
tmp_path=self.tmp_path,
progress_bar=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
progress_bar=True,
progress_bar=self.progress_bar,

Let's you skip generation of a progress bar in unit tests.

"""instance of input reader that specifies arguments necessary for reading
from your input files"""

constant_healpix_order: int = 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need any new healpix orders? We should be using the existing partitioning from the left catalog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a pile of rows in a csv (or set of csvs) that come out of the macauff package and need to be matched against the partitioning of the left catalog

left_pixels = left_catalog.partition_info.get_healpix_pixels()

# assign a constant healpix order value for the data
_map_pixels(args, client)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

dec_column=args.left_dec_column,
id_column=args.left_id_column,
add_hipscat_index=False,
use_schema_file=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have the schema parquet file, it would be good to pass it in here.

But I'm not sure that the catalog's reduce_pixel_shards method is the best call here.


alignment = align_trees(
left=left_pixel_tree,
right=cross_match_pixel_tree,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are we adding in the join_Norder / join_Npix columns to associate with the right catalog?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, that needs to be added as well

@maxwest-uw maxwest-uw closed this Dec 14, 2023
@delucchi-cmu delucchi-cmu deleted the macauff_import_pipeline branch April 2, 2024 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants